On an Information Theoretic Approximation Measure for Functional Dependencies

Authors

  • Chris Giannella
  • Edward Robertson
Abstract

We investigate the problem of defining an approximation measure for functional dependencies (FDs). For fixed sets of attributes, X and Y, an approximation measure is a function which maps relation instances to real numbers. The number to which an instance is mapped, intuitively, describes the strength of the dependency, X → Y, in that instance. We define an approximation measure for FDs based on a connection between Shannon's information theory and relational database theory. Our measure is normalized to lie between zero and one (inclusive), and maps a relation instance to zero if and only if X → Y holds in the instance. Hence, the smaller the number to which an instance is mapped, the "closer" X → Y is to being an FD in the instance. To put our measure in context, we compare it to a slight variation of a measure previously defined by Kivinen and Mannila, g3. We denote the variation as ĝ3, although our results apply essentially unchanged to g3. The purpose of comparing our measure with ĝ3 is to develop a deeper understanding not only of our measure but also of ĝ3. Moreover, we gain a deeper understanding of the natural intuitive notion of an approximate FD. We observe that our measure and ĝ3 agree at their extremes but are quite different in between. As a result, we conclude that our measure and ĝ3 are significantly different. An interesting question emerges from this conclusion: is there a rigorous way to determine when one measure better captures the meaning of the degree to which an FD is approximate?
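
The abstract does not state either formula, so the following Python sketch is only an illustration under stated assumptions. It assumes the information-theoretic measure is the conditional entropy H(Y | X) of the empirical tuple distribution, divided by log2 of the number of distinct Y-values in the instance (an assumed normalization, chosen so that the result lies in [0, 1] and is zero exactly when X → Y holds). It also computes the classical g3 quantity, the smallest fraction of tuples whose removal makes X → Y hold; the exact definition of the variant ĝ3 is not given here and is not reproduced. The function names and the list-of-dicts representation of a relation instance are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def project(row, attrs):
    """Project a tuple (a dict) onto a list of attribute names."""
    return tuple(row[a] for a in attrs)


def conditional_entropy(rows, X, Y):
    """H(Y | X) in bits, weighting every tuple of `rows` equally."""
    n = len(rows)
    x_counts = Counter(project(r, X) for r in rows)
    xy_counts = Counter((project(r, X), project(r, Y)) for r in rows)
    # Sum over (x, y) pairs of p(x, y) * log2(p(x) / p(x, y)).
    return sum(c / n * log2(x_counts[x] / c) for (x, _), c in xy_counts.items())


def entropy_measure(rows, X, Y):
    """ASSUMED normalization: H(Y | X) / log2(#distinct Y-values)."""
    distinct_y = len({project(r, Y) for r in rows})
    if distinct_y <= 1:
        return 0.0  # a single Y-value means X -> Y holds trivially
    return conditional_entropy(rows, X, Y) / log2(distinct_y)


def g3(rows, X, Y):
    """Smallest fraction of tuples to delete so that X -> Y holds exactly."""
    groups = defaultdict(Counter)
    for r in rows:
        groups[project(r, X)][project(r, Y)] += 1
    # Within each X-group, keep only the most frequent Y-value.
    kept = sum(max(c.values()) for c in groups.values())
    return 1 - kept / len(rows)


# Hypothetical instance: A -> B is violated only inside the group A = 2.
rows = [
    {"A": 1, "B": "x"},
    {"A": 1, "B": "x"},
    {"A": 2, "B": "y"},
    {"A": 2, "B": "z"},
]
print(entropy_measure(rows, ["A"], ["B"]), g3(rows, ["A"], ["B"]))
```

On an instance where X → Y holds, both functions return zero, which matches the abstract's remark that the two measures agree at their extremes; on instances in between, the two values generally differ.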

Similar Resources

An Axiomatic Approach to Defining Approximation Measures for Functional Dependencies

We consider the problem of defining an approximation measure for functional dependencies (FDs). An approximation measure for X → Y is a function mapping relation instances, r, to non-negative real numbers. The number to which r is mapped, intuitively, describes the “degree” to which the dependency X → Y holds in r. We develop a set of axioms for measures based on the following intuition. The de...

Approximation Measures for Conditional Functional Dependencies Using Stripped Conditional Partitions

Received Apr 11, 2017; revised May 5, 2017; accepted May 24, 2017. Conditional functional dependencies (CFDs) have been used to improve the quality of data, including detecting and repairing data inconsistencies. Approximation measures have significant importance for data dependencies in data mining. To adapt to exceptions in real data, the measures are used to relax the strictness of CFDs for mor...

A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation

Recommender systems use information retrieval and machine learning techniques to filter information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in recommender systems based on collaborative filtering. To improve the accuracy of traditional user-based collaborative filtering techniques under the new-user cold-start problem a...

Information Dependencies

This paper uses the tools of information theory to examine and reason about the information content of the attributes within a relation instance. For two sets of attributes, an information dependency measure (InD measure) characterizes the uncertainty remaining about the values for the second set when the values for the first set are known. A variety of arithmetic inequalities (InD inequalities) ...

Design and adjustment of dependency measures

Dependency measures are fundamental for a number of important applications in data mining and machine learning. They are ubiquitously used: for feature selection, for clustering comparisons and validation, as splitting criteria in random forest, and to infer biological networks, to list a few. More generally, there are three important applications of dependency measures: detection, quantificati...

Publication date: 2007